On the reliability of information retrieval metrics based on graded relevance
نویسنده
چکیده
This paper compares 14 information retrieval metrics based on graded relevance, together with 10 traditional metrics based on binary relevance, in terms of stability, sensitivity and resemblance of system rankings. More specifically, we compare these metrics using the Buckley/Voorhees stability method, the Voorhees/Buckley swap method and Kendall’s rank correlation, with three data sets comprising test collections and submitted runs from NTCIR. Our experiments show that (Average) Normalised Discounted Cumulative Gain at document cut-off l are the best among the rank-based gradedrelevance metrics, provided that l is large. On the other hand, if one requires a recall-based graded-relevance metric that is highly correlated with Average Precision, then Q-measure is the best choice. Moreover, these best graded-relevance metrics are at least as stable and sensitive as Average Precision, and are fairly robust to the choice of gain values. 2006 Elsevier Ltd. All rights reserved.
منابع مشابه
Review of ranked-based and unranked-based metrics for determining the effectiveness of search engines
Purpose: Traditionally, there have many metrics for evaluating the search engine, nevertheless various researchers’ proposed new metrics in recent years. Aware of this new metrics is essential to conduct research on evaluation of the search engine field. So, the purpose of this study was to provide an analysis of important and new metrics for evaluating the search engines. Methodology: This is ...
متن کاملThe Role of the FUM Students' Demographic Features in the Relevance Judgment Scores of Their Information Retrieval Results in Search Engines
In order to design user-friendly information retrieval systems, it is important to pay attention to characteristics of users. Therefore, the aim of the present study is to investigate the role of demographic variables of users during their search in search engines. Method: This is an applied study in terms of purpose, which was done by the evaluation method. To conduct the research, firstly,...
متن کاملOn Penalising Late Arrival of Relevant Documents in Information Retrieval Evaluation with Graded Relevance
Large-scale information retrieval evaluation efforts such as TREC and NTCIR have tended to adhere to binary-relevance evaluation metrics, even when graded relevance data were available. However, the NTCIR-6 Crosslingual Task has finally started adopting graded-relevance metrics, though only as additional metrics. This paper compares three existing graded-relevance metrics that were mentioned in...
متن کاملControlling the Penalty on Late Arrival of Relevant Documents in Information Retrieval Evaluation with Graded Relevance
Large-scale information retrieval evaluation efforts such as TREC and NTCIR have always used binary-relevance evaluation metrics, even when graded relevance data were available. However, the NTCIR-6 crosslingual task has finally announced that it will use graded-relevance metrics, though only as additional metrics. This paper compares graded-relevance metrics in terms of the ability to control ...
متن کاملDocument Image Retrieval Based on Keyword Spotting Using Relevance Feedback
Keyword Spotting is a well-known method in document image retrieval. In this method, Search in document images is based on query word image. In this Paper, an approach for document image retrieval based on keyword spotting has been proposed. In proposed method, a framework using relevance feedback is presented. Relevance feedback, an interactive and efficient method is used in this paper to imp...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Inf. Process. Manage.
دوره 43 شماره
صفحات -
تاریخ انتشار 2007